HARDWARE INSTRUCTION TRANSLATION WITHIN A PROCESSOR PIPELINE

This invention relates to data processing systems. More particularly, this invention
relates to data processing systems in which instruction translation from one instruction set to
another instruction set occurs within a processor pipeline.



It is known to provide processing systems in which instruction translation from a first
instruction set to a second instruction set takes place within the instruction pipeline. In these
systems each instruction to be translated maps to a single native instruction. Examples of
such systems are the processors produced by ARM Limited that support both ARM and
Thumb instruction codes.

It is also known to provide processing systems in which non-native instructions may 
be translated into native instruction sequences comprising multiple native instructions. An 
example of such a system is described in US-A-5,937,193. This system maps Java bytecodes
to 32-bit ARM instructions. The translation takes place before the instructions are passed into 
the processor pipeline and utilises memory address remapping techniques. A Java bytecode is 
used to look up a sequence of ARM instructions in a memory that then emulate the action of 
the Java bytecode. 


The system of US-A-5,937,193 has several associated disadvantages. Such a system
is inefficient in the way it utilises memory and memory fetches. The ARM instruction
sequences all occupy the same amount of memory space even if they could be arranged to
occupy less. Multiple fetches of ARM instructions from memory are required upon the
decoding of each Java bytecode, which disadvantageously consumes power and
disadvantageously impacts performance. The translated instruction sequences are fixed,
making it difficult to take account of the different starting system states that may exist
when each Java bytecode is executed and that could result in different, or better optimised,
instruction translations.


Examples of known systems for translation between instruction sets and other
background information may be found in the following: US-A-5,805,895; US-A-3,955,180;
US-A-5,970,242; US-A-5,619,665; US-A-5,826,089; US-A-5,925,123; US-A-5,875,336;
US-A-5,937,193; US-A-5,953,520; US-A-6,021,469; US-A-5,568,646; US-A-5,758,115;
IBM Technical Disclosure Bulletin, "System/370 Emulator Assist Processor For a Reduced
Instruction Set Computer"; IBM Technical Disclosure Bulletin, 1986, pp548-549, "Full
Function Series/1 Instruction Set Emulator"; IBM Technical Disclosure Bulletin, March 1994,
pp605-606, "Real-Time CISC Architecture HW Emulator On A RISC Processor"; IBM Technical
Disclosure Bulletin, "Performance Improvement Using An EMULATION Control Block"; IBM
Technical Disclosure Bulletin, January 1995, pp537-540, "Fast Instruction Decode For Code
Emulation on Reduced Instruction Set Computer/Cycles Systems"; IBM Technical Disclosure
Bulletin, 1993, pp231-234, "High Performance Dual Architecture Processor"; IBM Technical
Disclosure Bulletin, August 1989, pp40-43, "System/370 I/O Channel Program Channel
Command Word Prefetch"; IBM Technical Disclosure Bulletin, June 1985, pp305-306, "Fully
Microcode-Controlled Emulation Architecture"; IBM Technical Disclosure Bulletin, March
1972, "Op Code and Status Handling For Emulation"; IBM Technical Disclosure Bulletin,
August 1982, pp954-956, "On-Chip Microcoding of a Microprocessor With Most Frequently
Used Instructions of Large System and Primitives Suitable for Coding Remaining
Instructions"; IBM Technical Disclosure Bulletin, April 1983, "Emulation Instruction"; the
book ARM System Architecture by S Furber; and the book Computer Architecture: A
Quantitative Approach by Hennessy and Patterson.

Viewed from one aspect the present invention provides apparatus for processing data,
said apparatus comprising:

a processor core operable to execute operations as specified by instructions of a first
instruction set, said processor core having an instruction pipeline into which instructions to be
executed are fetched from a memory and along which instructions progress; and

an instruction translator operable to translate instructions of a second instruction set into
translator output signals corresponding to instructions of said first instruction set; wherein

said instruction translator is within said instruction pipeline and translates instructions
of said second instruction set that have been fetched into said instruction pipeline from said
memory;

at least one instruction of said second instruction set specifies a multi-step operation
that requires a plurality of operations that may be specified by instructions of said first
instruction set in order to be performed by said processor core; and

said instruction translator is operable to generate a sequence of translator output
signals to control said processor core to perform said multi-step operation.
The present invention provides the instruction translator within the instruction pipeline
of the processor core itself downstream of the fetch stage. In this way, the non-native 
instructions (second instruction set instructions) may be stored within the memory system in 
the same way as native instructions (first instruction set instructions) thereby removing what 
would otherwise be a constraint on memory system usage. Furthermore, for each non-native 
instruction, a single memory fetch of a non-native instruction from the memory system takes 
place with generation of any multi-step sequence of native instruction operations occurring 
within the processor pipeline. This reduces the power consumed by memory fetches and 
improves performance. In addition, the instruction translator within the pipeline is able to 
issue a variable number of native instruction operations down the remainder of the pipeline to 
be executed in dependence upon the particular non-native instruction being decoded and in 
dependence upon any surrounding system state that may influence what native operations 
may efficiently perform the desired non-native operation. 

It will be appreciated that the instruction translator could generate translator output 
signals that fully and completely represent native instructions from the first instruction set. 
Such an arrangement may allow the simple re-use of hardware logic that was designed to 
operate with those instructions of the first instruction set. However, it will be appreciated that 
the instruction translator may also generate translator output signals that are control signals
that can produce the same effect as native instructions without directly corresponding to them,
or that additionally provide further operations, such as extended operand fields, that were not
themselves directly provided by instructions of the first instruction set.

Providing the instruction translator within the instruction pipeline enables a program 
counter value for the processor core to be used to fetch non-native instructions from the 
memory in a conventional manner as the translation into native instructions of non-native 
instructions takes place without reliance upon the memory organisation. Furthermore, the 
program counter value may be controlled so as to be advanced in accordance with the 
execution of non-native instructions without a dependence upon whether or not those non- 
native instructions translate into single step or multi-step operations of native instructions. 
Using the program counter value to track the execution of non-native instructions 
advantageously simplifies methods for dealing with interrupts, branches and other aspects of 
the system design. 
Providing the instruction translator within the instruction pipeline, in a way which
may be considered as providing a finite state machine, has the result that the instruction
translator is more readily able to adjust the translated instruction operations to reflect the
system state as well as the non-native instruction being translated. As a particularly preferred
example of this, when the second instruction set specifies stack based processing and the
processor core is one intended for register based processing, then it is possible to use a set of
the registers to effectively cache stack operands in order to speed up processing. In this
circumstance, the translated instruction sequences may vary depending upon whether or not a
particular stack operand is cached within a register or has to be fetched.

In order to reduce the impact that the instruction translator may have upon the
execution of native instructions, preferred embodiments are such that the instruction translator
within the instruction pipeline is provided with a bypass path such that, when operating in a
native instruction processing mode, native instructions can be processed without being
influenced by the instruction translator.

It will be appreciated that the native instructions and the non-native instructions could
take many different forms. However, the invention is particularly useful when the non-native
instructions of the second instruction set are Java Virtual Machine instructions, as the
translation of these instructions into native instructions presents many of the problems and
difficulties which the present invention is able to address.

Viewed from another aspect the present invention provides a method of processing
data using a processor core having an instruction pipeline into which instructions to be
executed are fetched from a memory and along which instructions progress, said processor
core being operable to execute operations specified by instructions of a first instruction set,
said method comprising the steps of:

fetching instructions into said instruction pipeline; and

translating fetched instructions of a second instruction set into translator output signals
corresponding to instructions of said first instruction set using an instruction translator within
said instruction pipeline; wherein
at least one instruction of said second instruction set specifies a multi-step operation
that requires a plurality of operations that may be specified by instructions of said first 
instruction set in order to be performed by said processor core; and 

said instruction translator is operable to generate a sequence of translator output 
signals to control said processor core to perform said multi-step operation. 

The invention also provides a computer program product holding a computer program 
for controlling a computer in accordance with the above technique. 

When fetching instructions to be translated within an instruction pipeline a problem 
arises when the instructions to be translated are variable length instructions. The fetch stage 
of an instruction pipeline has relatively predictable operation when fetching fixed length 
instructions. For example, if an instruction is executed on each instruction cycle, then the 
fetch stage may be arranged to fetch an instruction upon each instruction cycle in order to 
keep the instruction pipeline full. However, when the instructions being fetched are of a 
variable length, then there is a difficulty in identifying the boundaries between instructions. 
In particular, in memory systems that provide fixed length memory reads, a variable length
instruction may span two memory reads, requiring a second fetch to read the final portion of
the instruction.

Viewed from another aspect the invention provides apparatus for processing data, said 
apparatus comprising: 

a processor core operable to execute operations as specified by instructions of a first
instruction set, said processor core having an instruction pipeline into which instructions to be 
executed are fetched from a memory and along which instructions progress; and 

an instruction translator operable to translate instructions of a second instruction set 
into translator output signals corresponding to instructions of said first instruction set; wherein 

said instructions of said second instruction set are variable length instructions; 

said instruction translator is within said instruction pipeline and translates instructions 
of said second instruction set that have been fetched into a fetch stage of said instruction 
pipeline from said memory; and 

said fetch stage of said instruction pipeline includes an instruction buffer holding at
least a current instruction word and a next instruction word fetched from said memory such
that if a variable length instruction of said second instruction set starts within said current
instruction word and extends into said next instruction word, then said next instruction word
is available within said pipeline for translation by said instruction translator without requiring
a further fetch operation.

The invention provides a buffer within the fetch stage storing at least a current
instruction word and a next instruction word. In this way, if a particular variable length
instruction extends out of the current instruction word into the next instruction word, then the
next instruction word has already been fetched and so is available for immediate decode and
use. A second, power inefficient fetch is also avoided. It will be appreciated that providing a
fetch stage in the pipeline that buffers a next instruction word as well as the current
instruction word and supports variable length instructions makes the fetch stage operate in a
more asynchronous manner relative to the rest of the stages within the instruction pipeline.
This is counter to the normal operational trend within instruction pipelines for executing fixed
length instructions, in which the pipeline stages tend to operate in synchrony.

Embodiments of the invention that buffer instructions within the fetch stage are well
suited to use within systems that also have the above described preferred features set out in
relation to the first aspect of the invention.

Viewed from another aspect the invention provides a method of processing data using
a processor core operable to execute operations as specified by instructions of a first
instruction set, said processor core having an instruction pipeline into which instructions to be
executed are fetched from a memory and along which instructions progress, said method
comprising the steps of:

fetching instructions into said instruction pipeline; and

translating fetched instructions of a second instruction set into translator output signals
corresponding to instructions of said first instruction set using an instruction translator within
said instruction pipeline; wherein

said instructions of said second instruction set are variable length instructions;

said instruction translator is within said instruction pipeline and translates instructions
of said second instruction set that have been fetched into a fetch stage of said instruction
pipeline from said memory; and

said fetch stage of said instruction pipeline includes an instruction buffer holding at
least a current instruction word and a next instruction word fetched from said memory such
that if a variable length instruction of said second instruction set starts within said current
instruction word and extends into said next instruction word, then said next instruction word 
is available within said pipeline for translation by said instruction translator without requiring 
a further fetch operation. 



Embodiments of the invention will now be described, by way of example only, with 
reference to the accompanying drawings in which: 

Figures 1 and 2 schematically represent example instruction pipeline arrangements;

Figure 3 illustrates in more detail a fetch stage arrangement;

Figure 4 schematically illustrates the reading of variable length non-native instructions
from within buffered instruction words within the fetch stage;

Figure 5 schematically illustrates a data processing system for executing both
processor core native instructions and instructions requiring translation;

Figure 6 schematically illustrates, for a sequence of example instructions and states, the
contents of the registers used for stack operand storage, the mapping states and the
relationship between instructions requiring translation and native instructions;

Figure 7 schematically illustrates the execution of a non-native instruction as a
sequence of native instructions;

Figure 8 is a flow diagram illustrating the way in which the instruction translator may
operate in a manner that preserves interrupt latency for translated instructions;

Figure 9 schematically illustrates the translation of Java bytecodes into ARM opcodes
using hardware and software techniques;

Figure 10 schematically illustrates the flow of control between a hardware based
translator, a software based interpreter and software based scheduling;

Figures 11 and 12 illustrate another way of controlling scheduling operations using a 
timer based approach; and 

Figure 13 is a signal diagram illustrating the signals controlling the operation of the 
circuit of Figure 12. 

Figure 1 shows a first example instruction pipeline 30 of a type suitable for use in an 
ARM processor based system. The instruction pipeline 30 includes a fetch stage 32, a native 
instruction (ARM/Thumb instructions) decode stage 34, an execute stage 36, a memory 
access stage 38 and a write back stage 40. The execute stage 36, the memory access stage 38 
and the write back stage 40 are substantially conventional. Downstream of the fetch stage 32, 
and upstream of the native instruction decode stage 34, there is provided an instruction
translator stage 42. The instruction translator stage 42 is a finite state machine that translates 
Java bytecode instructions of a variable length into native ARM instructions. The instruction 
translator stage 42 is capable of multi-step operation whereby a single Java bytecode 
instruction may generate a sequence of ARM instructions that are fed along the remainder of 
the instruction pipeline 30 to perform the operation specified by the Java bytecode instruction. 
Simple Java bytecode instructions may require only a single ARM instruction to perform
their operation, whereas more complicated Java bytecode instructions, or circumstances in
which the surrounding system state so dictates, may need several ARM instructions to
provide the operation specified by the Java bytecode instruction. This multi-step operation
takes place downstream of the fetch stage 32 and accordingly power is not expended upon
fetching multiple translated ARM instructions or Java bytecodes from a memory system. The
Java bytecode instructions are stored within the memory system in a conventional manner
such that no additional constraints are imposed upon the memory system in order to support
the Java bytecode translation operation.
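The multi-step behaviour of the instruction translator stage 42 can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the hardware described here: the opcode values are standard JVM opcodes (iconst_1 = 0x04, bipush = 0x10, sipush = 0x11, iadd = 0x60), but the `Entry` table, its `require_full` values and the native operation strings (with placeholder registers such as `Rt` and `Rn`) are hypothetical. It shows one fetched bytecode expanding into a variable number of native operations, with the count depending both on the bytecode and on how many stack operands are already cached in registers.

```python
from typing import Dict, Iterator, List, NamedTuple, Tuple

NUM_STACK_REGS = 4  # registers set aside for cached stack operands (R0-R3)


class Entry(NamedTuple):
    length: int          # bytecode length in bytes, opcode included
    require_full: int    # operands that must be cached before execution
    delta: int           # net change in stack depth
    ops: List[str]       # illustrative native operations (placeholders)


# Hypothetical translation table keyed by real JVM opcode values.
TABLE: Dict[int, Entry] = {
    0x04: Entry(1, 0, +1, ["MOV Rt, #1"]),                              # iconst_1
    0x10: Entry(2, 0, +1, ["MOV Rt, #imm8"]),                           # bipush
    0x11: Entry(3, 0, +1, ["MOV Rt, #imm_lo", "ORR Rt, Rt, #imm_hi"]),  # sipush
    0x60: Entry(1, 2, -1, ["ADD Rt, Rs1, Rs0"]),                        # iadd
}


def translate(code: bytes, cached: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (bytecode_pc, native_op) pairs for an already fetched bytecode stream.

    Every native op produced for one bytecode carries that bytecode's PC.
    """
    pc = 0
    while pc < len(code):
        entry = TABLE[code[pc]]
        # State-dependent prologue 1: load operands that are not yet cached.
        while cached < entry.require_full:
            yield pc, "LDR Rn, [Rstack, #-4]!"
            cached += 1
        # State-dependent prologue 2: spill to memory if a push would overflow.
        while entry.delta > 0 and cached + entry.delta > NUM_STACK_REGS:
            yield pc, "STR Rn, [Rstack], #4"
            cached -= 1
        for op in entry.ops:
            yield pc, op
        cached += entry.delta
        pc += entry.length  # operand bytes are consumed here, never re-fetched


if __name__ == "__main__":
    for pc, op in translate(bytes([0x04, 0x04, 0x60])):
        print(pc, op)
```

In this model the same `iadd` byte yields one native operation when two operands are already cached but three when none are, which is the kind of state-dependent expansion the text describes.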

As illustrated, the instruction translator stage 42 is provided with a bypass path. When 
not operating in an instruction translating mode, the instruction pipeline 30 may bypass the 
instruction translator stage 42 and operate in an essentially unaltered manner to provide 
decoding of native instructions. 




In the instruction pipeline 30, the instruction translator stage 42 is illustrated as 
generating translator output signals that fully represent corresponding ARM instructions and 
are passed via a multiplexer to the native instruction decoder 34. The instruction translator 42 
also generates some extra control signals that may be passed to the native instruction decoder 
34. Bit space constraints within the native instruction encoding may impose limitations upon 
the range of operands that may be specified by native instructions. These limitations are not 
necessarily shared by the non-native instructions. Extra control signals are provided to pass 
additional instruction specifying signals derived from the non-native instructions that would 
not be possible to specify within native instructions stored within memory. As an example, a 
native instruction may only provide a relatively low number of bits for use as an immediate 
operand field within a native instruction, whereas the non-native instruction may allow an 
extended range and this can be exploited by using the extra control signals to pass the 
extended portion of the immediate operand to the native instruction decoder 34 outside of the 
translated native instruction that is also passed to the native instruction decoder 34. 
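The extended-immediate mechanism can be sketched as splitting one Java immediate into a part that fits the native instruction field and a remainder carried on the extra control signals. This is a hypothetical sketch: the 8-bit native field width is a simplification (an ARM data-processing immediate is actually an 8-bit value combined with a rotation), and `TranslatorOutput`, `split_immediate` and `decoder_view` are illustrative names. The 16-bit signed width matches the JVM `sipush` operand.

```python
from dataclasses import dataclass

NATIVE_IMM_BITS = 8    # hypothetical width of the native immediate field
JAVA_IMM_BITS = 16     # sipush carries a signed 16-bit immediate


@dataclass(frozen=True)
class TranslatorOutput:
    native_imm: int  # bits that fit inside the translated native instruction
    extra_imm: int   # remaining bits, carried on the extra control signals


def split_immediate(value: int) -> TranslatorOutput:
    """Split a Java immediate into a native field plus a side-band extension."""
    raw = value & ((1 << JAVA_IMM_BITS) - 1)  # two's-complement encoding
    low_mask = (1 << NATIVE_IMM_BITS) - 1
    return TranslatorOutput(raw & low_mask, raw >> NATIVE_IMM_BITS)


def decoder_view(out: TranslatorOutput) -> int:
    """Value a decoder that also receives the extra signals can reconstruct."""
    raw = (out.extra_imm << NATIVE_IMM_BITS) | out.native_imm
    sign_bit = 1 << (JAVA_IMM_BITS - 1)
    return raw - (1 << JAVA_IMM_BITS) if raw & sign_bit else raw
```

A decoder that ignored `extra_imm` would see only the low byte; passing the extension alongside the translated instruction lets the full operand reach the execute stage without a second native instruction.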

Figure 2 illustrates a further instruction pipeline 44. In this example, the system is
provided with two native instruction decoders 46, 48 as well as a non-native instruction
decoder 50. The non-native instruction decoder 50 is constrained in the operations it can
specify by the execute stage 52, the memory stage 54 and the write back stage 56 that are
provided to support the native instructions. Accordingly, the non-native instruction decoder
50 must effectively translate the non-native instructions into native operations (which may be
a single native operation or a sequence of native operations) and then supply appropriate
control signals to the execute stage 52 to carry out these one or more native operations. It will
be appreciated that in this example the non-native instruction decoder does not produce
signals that form a native instruction, but rather provides control signals that specify native
instruction (or extended native instruction) operations. The control signals generated may not
match the control signals generated by the native instruction decoders 46, 48.

In operation, an instruction fetched by the fetch stage 58 is selectively supplied to one 
of the instruction decoders 46, 48 or 50 in dependence upon the particular processing mode 
using the illustrated demultiplexer. 



Figure 3 schematically illustrates the fetch stage of an instruction pipeline in more 
detail. Fetching logic 60 fetches fixed length instruction words from a memory system and 
supplies these to an instruction word buffer 62. The instruction word buffer 62 is a swing 
buffer having two sides such that it may store both a current instruction word and a next 
instruction word. Whenever the current instruction word has been fully decoded and
decoding has progressed onto the next instruction word, the fetch logic 60 replaces the
previous current instruction word with the next instruction word to be fetched from memory,
i.e. the two sides of the swing buffer store alternate instruction words, each side advancing
by two instruction words every time it is refilled.
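The interleaved refill of the swing buffer can be modelled in a few lines. This is a minimal behavioural sketch, assuming word-granular memory represented as a Python list; `SwingBuffer`, `retire` and the `fetches` counter are hypothetical names. Word n always lives in side n % 2, so each side advances by two words per refill and every word is read from memory exactly once.

```python
from typing import List, Optional, Tuple


class SwingBuffer:
    """Two-sided instruction word buffer: word n is always held in side n % 2."""

    def __init__(self, memory: List[int]) -> None:
        self.memory = memory
        self.sides: List[Optional[Tuple[int, int]]] = [None, None]  # (index, word)
        self.fetches = 0  # number of memory reads performed
        self._fill(0)     # current instruction word
        self._fill(1)     # next instruction word

    def _fill(self, index: int) -> None:
        side = index % 2
        if index < len(self.memory):
            # Each instruction word is read from memory exactly once.
            self.sides[side] = (index, self.memory[index])
            self.fetches += 1
        else:
            self.sides[side] = None  # ran off the end of the program

    def word(self, index: int) -> int:
        entry = self.sides[index % 2]
        if entry is None or entry[0] != index:
            raise LookupError(f"word {index} is not currently buffered")
        return entry[1]

    def retire(self, index: int) -> None:
        """Decoding has moved past word `index`; refill its side with index + 2."""
        self._fill(index + 2)
```

In this model a program of four words costs exactly four memory reads, however many bytecodes are decoded out of those words.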

In the example illustrated, the maximum instruction length of a Java bytecode 
instruction is three bytes. Accordingly, three multiplexers are provided that enable any three 
neighbouring bytes within either side of the word buffer 62 to be selected and supplied to the 
instruction translator 64. The word buffer 62 and the instruction translator 64 are also 
provided with a bypass path 66 for use when native instructions are being fetched and 
decoded. 

It will be seen that each instruction word is fetched from memory once and stored 
within the word buffer 62. A single instruction word may have multiple Java bytecodes read 
from it as the instruction translator 64 performs the translation of Java bytecodes into ARM 
instructions. Variable length translated sequences of native instructions may be generated 
without requiring multiple memory system reads and without consuming memory resource or 
imposing other constraints upon the memory system as the instruction translation operations 
are confined within the instruction pipeline. 



A program counter value is associated with each Java bytecode currently being 
translated. This program counter value is passed along the stages of the pipeline such that 
each stage is able, if necessary, to use the information regarding the particular Java bytecode 
it is processing. The program counter value for a Java bytecode that translates into a 
sequence of a plurality of ARM instruction operations is not incremented until the final ARM 
instruction operation within that sequence starts to be executed. Keeping the program counter 
value in a manner that continues to directly point to the instruction within the memory that is 
being executed advantageously simplifies other aspects of the system, such as debugging and 
branch target calculation. 
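Branch target calculation illustrates why this is convenient. The JVM specification defines the 16-bit `goto` offset relative to the address of the `goto` opcode itself, so if the program counter still points at the bytecode while any of its native operations execute, the target can be formed without correcting for how many native operations preceded the branch. The function names below are hypothetical, and `pcs_for_sequences` simply models the rule stated above for a list of (bytecode length, native operation count) pairs.

```python
from typing import List, Tuple


def signed16(hi: int, lo: int) -> int:
    """Combine two big-endian operand bytes into a signed 16-bit value."""
    value = (hi << 8) | lo
    return value - 0x10000 if value & 0x8000 else value


def goto_target(bytecode_pc: int, hi: int, lo: int) -> int:
    """JVM goto: the offset is relative to the address of the goto opcode."""
    return bytecode_pc + signed16(hi, lo)


def pcs_for_sequences(bytecodes: List[Tuple[int, int]], start_pc: int = 0) -> List[int]:
    """PC value seen by each native op, given (bytecode_length, op_count) pairs.

    The PC stays on a bytecode for its whole native sequence and only then
    advances by that bytecode's length.
    """
    pcs: List[int] = []
    pc = start_pc
    for length, op_count in bytecodes:
        pcs.extend([pc] * op_count)
        pc += length
    return pcs
```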
Figure 4 schematically illustrates the reading of variable length Java bytecode
instructions from the instruction buffer 62. At the first stage a Java bytecode instruction
having a length of one byte is read and decoded. The next stage is a Java bytecode instruction
that is three bytes in length and spans between two adjacent instruction words that have been
fetched from the memory. Both of these instruction words are present within the instruction
buffer 62 and so instruction decoding and processing is not delayed by this spanning of a
variable length instruction between fetched instruction words. Once the three bytes of this
Java bytecode instruction have been read from the instruction buffer 62, the refill of the earlier
fetched of the instruction words may commence, as subsequent processing will continue with
decoding of Java bytecodes from the following instruction word, which is already present.

The final stage illustrated in Figure 4 shows a second three byte instruction
being read. This again spans between instruction words. If the preceding instruction word
has not yet completed its refill, then reading of the instruction may be delayed by a pipeline
stall until the appropriate instruction word has been stored into the instruction buffer 62. In
some embodiments the timings may be such that the pipeline never stalls due to this type of
behaviour. It will be appreciated that the particular example is a relatively infrequent
occurrence, as most Java bytecodes are shorter than the examples illustrated, and accordingly
two successive decodes that both span between instruction words are relatively uncommon. A
valid signal may be associated with each of the instruction words within the instruction buffer
62 in a manner that is able to signal whether or not the instruction word has been
refilled before a Java bytecode is read from it.
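The Figure 4 sequence, including the valid signals and the possible stall, can be sketched as follows. This is a minimal model assuming 32-bit (four byte) instruction words and byte offsets counted from the start of word 0; `InstructionBuffer`, `can_read`, `release` and `refill_done` are hypothetical names, and the timing of refills is left to the caller.

```python
from typing import Dict, Tuple

WORD_BYTES = 4  # assumption: 32-bit instruction words


class InstructionBuffer:
    """Current and next instruction words, each with a valid flag."""

    def __init__(self) -> None:
        self.valid: Dict[int, bool] = {0: True, 1: True}  # word index -> valid

    @staticmethod
    def words_for(offset: int, length: int) -> Tuple[int, ...]:
        """Indices of the instruction words a bytecode at `offset` touches."""
        first = offset // WORD_BYTES
        last = (offset + length - 1) // WORD_BYTES
        return tuple(range(first, last + 1))

    def can_read(self, offset: int, length: int) -> bool:
        """True if every word the bytecode touches is valid; False means stall."""
        return all(self.valid.get(w, False) for w in self.words_for(offset, length))

    def release(self, word: int) -> None:
        """Decoding has moved past `word`: its side starts refilling with word + 2."""
        self.valid.pop(word, None)
        self.valid[word + 2] = False  # refill requested but not yet complete

    def refill_done(self, word: int) -> None:
        self.valid[word] = True
```

Used in the order of Figure 4: a one-byte bytecode at offset 2 and a three-byte bytecode at offset 3 (spanning words 0 and 1) can both be read at once; after word 0 is released, a second three-byte bytecode at offset 6 touches words 1 and 2 and stalls until `refill_done(2)` is signalled.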

Figure 5 shows a data processing system 102 including a processor core 104 and a
register bank 106. An instruction translator 108 is provided within the instruction path to
translate Java Virtual Machine instructions to native ARM instructions (or control signals
corresponding thereto) that may then be supplied to the processor core 104. The instruction
translator 108 may be bypassed when native ARM instructions are being fetched from the
addressable memory. The addressable memory may be a memory system such as a cache
memory with further off-chip RAM memory. Providing the instruction translator 108
downstream of the memory system, and particularly the cache memory, allows efficient use to
be made of the storage capacity of the memory system since dense instructions that require
translation may be stored within the memory system and only expanded into native
instructions immediately prior to being passed to the processor core 104.

The register bank 106 in this example contains sixteen general purpose 32-bit
registers, of which four are allocated for use in storing stack operands, i.e. the set of registers
for storing stack operands is registers R0, R1, R2 and R3.

The set of registers may be empty, partly filled with stack operands or completely
filled with stack operands. The particular register that currently holds the top of stack
operand may be any of the registers within the set of registers. It will thus be appreciated that
the instruction translator may be in any one of seventeen different mapping states
corresponding to one state when all of the registers are empty and four groups of four states,
each corresponding to a respective different number of stack operands being held within the
set of registers and with a different register holding the top of stack operand. Table 1
illustrates the seventeen different states of the state mapping for the instruction translator 108.
It will be appreciated that with a different number of registers allocated for stack operand
storage, or as a result of constraints that a particular processor core may have in the way it can
manipulate data values held within registers, the mapping states can vary considerably
depending upon the particular implementation, and Table 1 is only given as an example of one
particular implementation.



STATE    R0       R1       R2       R3
00000    EMPTY    EMPTY    EMPTY    EMPTY
00100    TOS      EMPTY    EMPTY    EMPTY
00101    EMPTY    TOS      EMPTY    EMPTY
00110    EMPTY    EMPTY    TOS      EMPTY
00111    EMPTY    EMPTY    EMPTY    TOS
01000    TOS      EMPTY    EMPTY    TOS-1
01001    TOS-1    TOS      EMPTY    EMPTY
01010    EMPTY    TOS-1    TOS      EMPTY
01011    EMPTY    EMPTY    TOS-1    TOS
01100    TOS      EMPTY    TOS-2    TOS-1
01101    TOS-1    TOS      EMPTY    TOS-2
01110    TOS-2    TOS-1    TOS      EMPTY
01111    EMPTY    TOS-2    TOS-1    TOS
10000    TOS      TOS-3    TOS-2    TOS-1
10001    TOS-1    TOS      TOS-3    TOS-2
10010    TOS-2    TOS-1    TOS      TOS-3
10011    TOS-3    TOS-2    TOS-1    TOS

TABLE 1



Within Table 1 it may be observed that the first three bits of the state value indicate 
the number of non-empty registers within the set of registers. The final two bits of the state 
value indicate the register number of the register holding the top of stack operand. In this 
way, the state value may be readily used to control the operation of a hardware translator or a 
software translator to take account of the current occupancy of the set of registers and the
current position of the top of stack operand. 
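The state encoding can be expressed directly in code. This sketch, with hypothetical function names, builds the five-bit state from an occupancy count and a top-of-stack register number and decodes it back into register contents. The decoding rule used here (operand TOS-k sits in register (t - k) mod 4, where t is the TOS register) is an observation that reproduces every row of Table 1; it is not stated as such in the text.

```python
from typing import Dict, List

NUM_REGS = 4  # R0-R3 hold cached stack operands


def encode_state(count: int, tos_reg: int) -> str:
    """Five-bit mapping state: 3 bits of occupancy, 2 bits of TOS register."""
    if count == 0:
        tos_reg = 0  # the single empty state is 00000
    return f"{count:03b}{tos_reg:02b}"


def decode_state(state: str) -> Dict[str, str]:
    """Register contents implied by a mapping state, in Table 1 notation."""
    count, tos = int(state[:3], 2), int(state[3:], 2)
    regs = {f"R{r}": "EMPTY" for r in range(NUM_REGS)}
    for depth in range(count):
        reg = (tos - depth) % NUM_REGS  # deeper operands wrap downwards
        regs[f"R{reg}"] = "TOS" if depth == 0 else f"TOS-{depth}"
    return regs


def all_states() -> List[str]:
    """One empty state plus four groups of four, as described in the text."""
    states = [encode_state(0, 0)]
    states += [encode_state(c, t) for c in range(1, NUM_REGS + 1) for t in range(NUM_REGS)]
    return states
```

Enumerating `all_states()` gives the seventeen mapping states of Table 1.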

As illustrated in Figure 5, a stream of Java bytecodes J1, J2, J3 is fed to the instruction
translator 108 from the addressable memory system. The instruction translator 108 then
outputs a stream of ARM instructions (or equivalent control signals, possibly extended)
dependent upon the input Java bytecodes and the instantaneous mapping state of the
instruction translator 108, as well as other variables. The example illustrated shows Java
bytecode J1 being mapped to ARM instructions A11 and A12. Java bytecode J2 maps to
ARM instructions A21, A22 and A23. Finally, Java bytecode J3 maps to ARM instruction
A31. Each of the Java bytecodes may require one or more stack operands as inputs and may
produce one or more stack operands as an output. Given that the processor core 104 in this
example is an ARM processor core having a load/store architecture whereby only data values
held within registers may be manipulated, the instruction translator 108 is arranged to
generate ARM instructions that, as necessary, fetch any required stack operands into the set of
registers before they are manipulated or store to addressable memory any currently held stack
operands within the set of registers to make room for result stack operands that may be
generated. It will be appreciated that each Java bytecode may be considered as having an
associated "require full" value indicating the number of stack operands that must be present
within the set of registers prior to its execution, together with a "require empty" value
indicating the number of empty registers within the set of registers that must be available
prior to execution of the ARM instructions representing the Java opcode.

Table 2 illustrates the relationship between initial mapping state values, require full
values, final state values and associated ARM instructions. The initial state values and the
final state values correspond to the mapping states illustrated in Table 1. The instruction
translator 108 determines a require full value associated with the particular Java bytecode
(opcode) it is translating. The instruction translator 108, in dependence upon the initial
mapping state that it has, determines whether or not more stack operands need to be loaded
into the set of registers prior to executing the Java bytecode. Table 2 shows the initial states
together with tests applied to the require full value of the Java bytecode that are together
applied to determine whether a stack operand needs to be loaded into the set of registers using
an associated ARM instruction (an LDR instruction), as well as the final mapping state that
will be adopted after such a stack cache load operation. In practice, if more than one stack
operand needs to be loaded into the set of registers prior to execution of the Java bytecode,
then multiple mapping state transitions will occur, each with an associated ARM instruction
loading a stack operand into one of the registers of the set of registers. In different
embodiments it may be possible to load multiple stack operands in a single state transition
and accordingly make mapping state changes beyond those illustrated in Table 2.

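INITIAL   REQUIRE   FINAL    ACTIONS
STATE     FULL      STATE

00000     >0        00100    LDR R0, [Rstack, #-4]!
00100     >1        01000    LDR R3, [Rstack, #-4]!
01001     >2        01101    LDR R3, [Rstack, #-4]!
01110     >3        10010    LDR R3, [Rstack, #-4]!
01111     >3        10011    LDR R0, [Rstack, #-4]!
01100     >3        10000    LDR R1, [Rstack, #-4]!
01101     >3        10001    LDR R2, [Rstack, #-4]!
01010     >2        01110    LDR R0, [Rstack, #-4]!
01011     >2        01111    LDR R1, [Rstack, #-4]!
01000     >2        01100    LDR R2, [Rstack, #-4]!
00110     >1        01010    LDR R1, [Rstack, #-4]!
00111     >1        01011    LDR R2, [Rstack, #-4]!
00101     >1        01001    LDR R0, [Rstack, #-4]!

TABLE 2

As will be seen from Table 2, a new stack operand loaded into the set of registers storing
stack operands is loaded into a particular one of the registers within the set of registers
depending upon the initial state: when the set of registers is empty it is loaded into R0 and
forms the new top of stack operand, and otherwise it is loaded into the register immediately
below the deepest stack operand already held, so that the top of stack register is unchanged.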
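The rows of Table 2 follow a simple pattern, which the Python sketch below reproduces. It is an illustrative model only (the function name and the string form of the LDR instructions are assumptions): while the require full value exceeds the number of occupied registers, one operand is popped from the memory stack into the register just below the deepest cached operand, or into R0 when the set of registers is empty.

```python
def fill_for_require_full(state: str, require_full: int) -> tuple[list[str], str]:
    """Issue Table 2 loads until `require_full` operands are held in R0-R3.

    Returns the LDR instructions generated and the final mapping state.
    """
    if not 0 <= require_full <= 4:
        raise ValueError("require full must be between 0 and 4")
    occupied, tos_register = int(state, 2) >> 2, int(state, 2) & 0b11
    loads = []
    while require_full > occupied:              # the ">N" test of Table 2
        if occupied == 0:
            tos_register = 0                    # 00000 -> 00100: first operand goes to R0
        target = (tos_register - occupied) % 4  # register just below the deepest operand
        loads.append(f"LDR R{target}, [Rstack, #-4]!")
        occupied += 1
    return loads, format((occupied << 2) | tos_register, "05b")


if __name__ == "__main__":
    # Each row of Table 2 corresponds to a single load.
    for initial in ("00000", "00100", "01001", "01110", "00101"):
        occupied = int(initial, 2) >> 2
        print(initial, fill_for_require_full(initial, occupied + 1))
```

Asking for two operands from state 00000 yields LDR R0 followed by LDR R3 and final state 01000, i.e. the first two rows of Table 2 applied in sequence.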
Table 3 in a similar manner illustrates the relationship between initial state, require
empty value, final state and an associated ARM instruction for emptying a register within the 
set of registers to move between the initial state and the final state if the require empty value 
of a particular Java bytecode indicates that it is necessary given the initial state before the 
Java bytecode is executed. The particular register values stored off to the addressable 
memory with an STR instruction will vary depending upon which of the registers is the 
current top of stack operand. 



INITIAL   REQUIRE   FINAL    ACTIONS
STATE     EMPTY     STATE

00100     >3        00000    STR R0, [Rstack], #4
01001     >2        00101    STR R0, [Rstack], #4
01110     >1        01010    STR R0, [Rstack], #4
10011     >0        01111    STR R0, [Rstack], #4
10000     >0        01100    STR R1, [Rstack], #4
10001     >0        01101    STR R2, [Rstack], #4
10010     >0        01110    STR R3, [Rstack], #4
01111     >1        01011    STR R1, [Rstack], #4
01100     >1        01000    STR R2, [Rstack], #4
01101     >1        01001    STR R3, [Rstack], #4
01010     >2        00110    STR R1, [Rstack], #4
01011     >2        00111    STR R2, [Rstack], #4
01000     >2        00100    STR R3, [Rstack], #4
00110     >3        00000    STR R2, [Rstack], #4
00111     >3        00000    STR R3, [Rstack], #4
00101     >3        00000    STR R1, [Rstack], #4

TABLE 3
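Table 3 can likewise be generated rather than stored. In the hypothetical Python sketch below, the operand furthest from the top of stack is the one written back, using the post-indexed STR form shown in the table, so the top of stack register is unchanged unless the set of registers becomes empty (every emptied state being 00000).

```python
def spill_for_require_empty(state: str, require_empty: int) -> tuple[list[str], str]:
    """Issue Table 3 stores until at least `require_empty` of R0-R3 are free.

    Returns the STR instructions generated and the final mapping state.
    """
    if not 0 <= require_empty <= 4:
        raise ValueError("require empty must be between 0 and 4")
    occupied, tos_register = int(state, 2) >> 2, int(state, 2) & 0b11
    stores = []
    while require_empty > 4 - occupied:                # the ">N" test of Table 3
        deepest = (tos_register - (occupied - 1)) % 4  # operand furthest from TOS
        stores.append(f"STR R{deepest}, [Rstack], #4")
        occupied -= 1
    if occupied == 0:
        tos_register = 0                               # every emptied state is 00000
    return stores, format((occupied << 2) | tos_register, "05b")
```

For instance, state 01101 with a require empty value of 2 produces a single STR R3 and final state 01001, matching the corresponding row of Table 3.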

It will be appreciated that in the above described example system the require full and 
require empty conditions are mutually exclusive, that is to say only one of the require full or 
require empty conditions can be true at any given time for a particular Java bytecode which 
the instruction translator is attempting to translate. The instruction templates used by the 
instruction translator 108 together with the instructions it is chosen to support with the 
hardware instruction translator 108 are selected such that this mutually exclusive requirement 
may be met. If this requirement were not in place, then the situation could arise in which a 
particular Java bytecode required a number of input stack operands to be present within the 
set of registers that would not allow sufficient empty registers to be available after execution 
of the instruction representing the Java bytecode to allow the results of the execution to be 
held within the registers as required. 
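One way to see why the pairing of require full and require empty values matters is to apply the single-operand transitions of Tables 2 and 3 until both preconditions hold. In the Python sketch below (an illustration only; the names are hypothetical and the step limit is an arbitrary safeguard), every bytecode in the subset listed later, together with the laload bytecode of Figure 7, settles from any starting occupancy, whereas a hypothetical bytecode needing three full and two empty registers never does, because only four registers are available.

```python
REGISTERS = 4  # R0-R3 cache stack operands

# (require_full, require_empty) values quoted in this description.
BYTECODES = {
    "iconst_0": (0, 1), "iadd": (2, 0), "lload_0": (0, 2), "lastore": (4, 0),
    "land": (4, 0), "iastore": (3, 0), "ineg": (1, 0), "laload": (2, 2),
}


def settle(occupied: int, require_full: int, require_empty: int, limit: int = 8) -> int:
    """Load or spill one operand at a time until both preconditions are met."""
    for _ in range(limit):
        need_load = require_full > occupied                 # Table 2 test
        need_store = require_empty > REGISTERS - occupied   # Table 3 test
        if need_load and need_store:
            raise RuntimeError("require full and require empty both true")
        if not (need_load or need_store):
            return occupied
        occupied += 1 if need_load else -1
    raise RuntimeError("loads and stores oscillate; preconditions never both hold")


def always_settles(require_full: int, require_empty: int) -> bool:
    """True if settle() succeeds from every possible register occupancy."""
    try:
        for occupied in range(REGISTERS + 1):
            settle(occupied, require_full, require_empty)
    except RuntimeError:
        return False
    return True
```

In this model both preconditions can be reached only when the require full and require empty values together do not exceed the four available registers.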
It will be appreciated that a given Java bytecode will have an overall net stack action
representing the balance between the number of stack operands consumed and the number
of stack operands generated upon execution of the Java bytecode. Since the number of stack
operands consumed is a requirement prior to execution and the number of stack operands
generated is a requirement after execution, the require full and require empty values
associated with each Java bytecode must be satisfied prior to execution of that bytecode even
if the net overall action would in itself be met. Table 4 illustrates the relationship between an
initial state, an overall stack action, a final state and a change in register use and relative
position of the top of stack operand (TOS). It may be that one or more of the state transitions
illustrated in Table 2 or Table 3 need to be carried out prior to carrying out the state
transitions illustrated in Table 4 in order to establish the preconditions for a given Java
bytecode depending on the require full and require empty values of the Java bytecode.
INITIAL   STACK    FINAL    ACTIONS
STATE     ACTION   STATE

00000     +1       00101    R1 <- TOS
00000     +2       01010    R1 <- TOS-1, R2 <- TOS
00000     +3       01111    R1 <- TOS-2, R2 <- TOS-1, R3 <- TOS
00000     +4       10000    R0 <- TOS, R1 <- TOS-3, R2 <- TOS-2, R3 <- TOS-1
00100     +1       01001    R1 <- TOS
00100     +2       01110    R1 <- TOS-1, R2 <- TOS
00100     +3       10011    R1 <- TOS-2, R2 <- TOS-1, R3 <- TOS
00100     -1       00000    R0 <- EMPTY
01001     +1       01110    R2 <- TOS
01001     +2       10011    R2 <- TOS-1, R3 <- TOS
01001     -1       00100    R1 <- EMPTY
01001     -2       00000    R0 <- EMPTY, R1 <- EMPTY
01110     +1       10011    R3 <- TOS
01110     -1       01001    R2 <- EMPTY
01110     -2       00100    R1 <- EMPTY, R2 <- EMPTY
01110     -3       00000    R0 <- EMPTY, R1 <- EMPTY, R2 <- EMPTY
10011     -1       01110    R3 <- EMPTY
10011     -2       01001    R2 <- EMPTY, R3 <- EMPTY
10011     -3       00100    R1 <- EMPTY, R2 <- EMPTY, R3 <- EMPTY
10011     -4       00000    R0 <- EMPTY, R1 <- EMPTY, R2 <- EMPTY, R3 <- EMPTY
10000     -1       01111    R0 <- EMPTY
10000     -2       01010    R0 <- EMPTY, R3 <- EMPTY
10000     -3       00101    R0 <- EMPTY, R2 <- EMPTY, R3 <- EMPTY
10000     -4       00000    R0 <- EMPTY, R1 <- EMPTY, R2 <- EMPTY, R3 <- EMPTY
10001     -1       01100    R1 <- EMPTY
10001     -2       01011    R0 <- EMPTY, R1 <- EMPTY
10001     -3       00110    R0 <- EMPTY, R1 <- EMPTY, R3 <- EMPTY
10001     -4       00000    R0 <- EMPTY, R1 <- EMPTY, R2 <- EMPTY, R3 <- EMPTY
10010     -1       01101    R2 <- EMPTY
10010     -2       01000    R1 <- EMPTY, R2 <- EMPTY
10010     -3       00111    R0 <- EMPTY, R1 <- EMPTY, R2 <- EMPTY
10010     -4       00000    R0 <- EMPTY, R1 <- EMPTY, R2 <- EMPTY, R3 <- EMPTY
01111     +1       10000    R0 <- TOS
01111     -1       01010    R3 <- EMPTY
01111     -2       00101    R2 <- EMPTY, R3 <- EMPTY
01111     -3       00000    R1 <- EMPTY, R2 <- EMPTY, R3 <- EMPTY
01100     +1       10001    R1 <- TOS
01100     -1       01011    R0 <- EMPTY
01100     -2       00110    R0 <- EMPTY, R3 <- EMPTY
01100     -3       00000    R0 <- EMPTY, R2 <- EMPTY, R3 <- EMPTY
01101     +1       10010    R2 <- TOS
01101     -1       01000    R1 <- EMPTY
01101     -2       00111    R0 <- EMPTY, R1 <- EMPTY
01101     -3       00000    R0 <- EMPTY, R1 <- EMPTY, R3 <- EMPTY
01010     +1       01111    R3 <- TOS
01010     +2       10000    R3 <- TOS-1, R0 <- TOS
01010     -1       00101    R2 <- EMPTY
01010     -2       00000    R1 <- EMPTY, R2 <- EMPTY
01011     +1       01100    R0 <- TOS
01011     +2       10001    R0 <- TOS-1, R1 <- TOS
01011     -1       00110    R3 <- EMPTY
01011     -2       00000    R2 <- EMPTY, R3 <- EMPTY
01000     +1       01101    R1 <- TOS
01000     +2       10010    R1 <- TOS-1, R2 <- TOS
01000     -1       00111    R0 <- EMPTY
01000     -2       00000    R0 <- EMPTY, R3 <- EMPTY
00110     +1       01011    R3 <- TOS
00110     +2       01100    R0 <- TOS, R3 <- TOS-1
00110     +3       10001    R1 <- TOS, R0 <- TOS-1, R3 <- TOS-2
00110     -1       00000    R2 <- EMPTY
00111     +1       01000    R0 <- TOS
00111     +2       01101    R0 <- TOS-1, R1 <- TOS
00111     +3       10010    R0 <- TOS-2, R1 <- TOS-1, R2 <- TOS
00111     -1       00000    R3 <- EMPTY
00101     +1       01010    R2 <- TOS
00101     +2       01111    R2 <- TOS-1, R3 <- TOS
00101     +3       10000    R2 <- TOS-2, R3 <- TOS-1, R0 <- TOS
00101     -1       00000    R1 <- EMPTY

TABLE 4
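The final states in Table 4 depend only on the initial state and the net stack action, which the following Python sketch makes explicit. It is a model under the stated assumption that the preconditions of Tables 2 and 3 have already been established; the function name and the textual form of the register changes are illustrative, and the order in which changes are listed may differ from some rows of the table.

```python
def apply_stack_action(state: str, action: int) -> tuple[str, list[str]]:
    """Return the final state and register changes for a net stack action (Table 4).

    Assumes the require full / require empty preconditions have already been met.
    """
    occupied, tos = int(state, 2) >> 2, int(state, 2) & 0b11
    final_occupied = occupied + action
    if not 0 <= final_occupied <= 4:
        raise ValueError("preconditions not met for this stack action")
    changes = []
    if action > 0:
        # New operands fill the registers above TOS, deepest first.
        for i in range(action):
            depth = action - 1 - i
            changes.append(f"R{(tos + 1 + i) % 4} <- TOS" + (f"-{depth}" if depth else ""))
    else:
        # Consumed operands are released from TOS downwards.
        for i in range(-action):
            changes.append(f"R{(tos - i) % 4} <- EMPTY")
    # The TOS register moves by the net action; an empty set of registers is 00000.
    tos = (tos + action) % 4 if final_occupied else 0
    return format((final_occupied << 2) | tos, "05b"), changes
```

A stack action of 0 (as for ineg) leaves the state and register allocation unchanged.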



It will be appreciated that the relationships between states and conditions illustrated in 
Table 2, Table 3 and Table 4 could be combined into a single state transition table or state 
diagram, but they have been shown separately above to aid clarity. 

The relationships between the different states, conditions, and nett actions may be 
used to define a hardware state machine (in the form of a finite state machine) for controlling 
this aspect of the operation of the instruction translator 108. Alternatively, these relationships 
could be modelled by software or a combination of hardware and software. 

There follows below an example of a subset of the possible Java bytecodes that
indicates for each Java bytecode of the subset the associated require full, require empty and
stack action values for that bytecode which may be used in conjunction with Tables 2, 3 and
4.



iconst_0
    Operation: Push int constant
    Stack: ... => ..., 0
    Require-Full = 0
    Require-Empty = 1
    Stack-Action = +1

iadd
    Operation: Add int
    Stack: ..., value1, value2 => ..., result
    Require-Full = 2
    Require-Empty = 0
    Stack-Action = -1

lload_0
    Operation: Load long from local variable
    Stack: ... => ..., value.word1, value.word2
    Require-Full = 0
    Require-Empty = 2
    Stack-Action = +2

lastore
    Operation: Store into long array
    Stack: ..., arrayref, index, value.word1, value.word2 => ...
    Require-Full = 4
    Require-Empty = 0
    Stack-Action = -4

land
    Operation: Boolean AND long
    Stack: ..., value1.word1, value1.word2, value2.word1, value2.word2 =>
           ..., result.word1, result.word2
    Require-Full = 4
    Require-Empty = 0
    Stack-Action = -2

iastore
    Operation: Store into int array
    Stack: ..., arrayref, index, value => ...
    Require-Full = 3
    Require-Empty = 0
    Stack-Action = -3

ineg
    Operation: Negate int
    Stack: ..., value => ..., result
    Require-Full = 1
    Require-Empty = 0
    Stack-Action = 0
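As a consistency check on the list above, the stack action of each bytecode equals the number of stack words produced minus the number consumed, and for this subset the require full value equals the number consumed. The Python sketch below is illustrative only (the dictionary simply restates the list, and the helper names are hypothetical); it derives both counts from the Stack: notation. The require empty value is deliberately not derived this way, since it can also reserve temporary registers, as the laload example of Figure 7 shows.

```python
# (stack notation, require_full, stack_action) restated from the subset above.
STACK_EFFECTS = {
    "iconst_0": ("... => ..., 0", 0, +1),
    "iadd": ("..., value1, value2 => ..., result", 2, -1),
    "lload_0": ("... => ..., value.word1, value.word2", 0, +2),
    "lastore": ("..., arrayref, index, value.word1, value.word2 => ...", 4, -4),
    "land": ("..., value1.word1, value1.word2, value2.word1, value2.word2"
             " => ..., result.word1, result.word2", 4, -2),
    "iastore": ("..., arrayref, index, value => ...", 3, -3),
    "ineg": ("..., value => ..., result", 1, 0),
}


def count_words(side: str) -> int:
    """Count named stack words, ignoring the '...' that stands for the rest of the stack."""
    return sum(1 for item in side.split(",") if item.strip() not in ("", "..."))


def check(name: str) -> bool:
    """True if the listed values agree with the Stack: notation for `name`."""
    notation, require_full, stack_action = STACK_EFFECTS[name]
    before, after = notation.split("=>")
    consumed, produced = count_words(before), count_words(after)
    return stack_action == produced - consumed and require_full == consumed


if __name__ == "__main__":
    for name in STACK_EFFECTS:
        print(name, check(name))
```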



There also follow example instruction templates for each of the Java bytecode
instructions set out above. The instructions shown are the ARM instructions which
implement the required behaviour of each of the Java bytecodes. The register fields "TOS-3",
"TOS-2", "TOS-1", "TOS", "TOS+1" and "TOS+2" may be replaced with the appropriate
register specifier as read from Table 1 depending upon the mapping state currently adopted.
The denotation "TOS+n" indicates the nth register above the register currently storing the top
of stack operand, starting from the register storing the top of stack operand and counting
upwards in register value until reaching the end of the set of registers, at which point a wrap is
made to the first register within the set of registers.



iconst_0    MOV     tos+1, #0

lload_0     LDR     tos+2, [vars, #4]
            LDR     tos+1, [vars, #0]

iastore     LDR     Rtmp2, [tos-2, #4]
            LDR     Rtmp1, [tos-2, #0]
            CMP     tos-1, Rtmp2, LSR #5
            BLXCS   Rexc
            STR     tos, [Rtmp1, tos-1, LSL #2]

lastore     LDR     Rtmp2, [tos-3, #4]
            LDR     Rtmp1, [tos-3, #0]
            CMP     tos-2, Rtmp2, LSR #5
            BLXCS   Rexc
            STR     tos-1, [Rtmp1, tos-2, LSL #3]!
            STR     tos, [Rtmp1, #4]

iadd        ADD     tos-1, tos-1, tos

ineg        RSB     tos, tos, #0

land        AND     tos-2, tos-2, tos
            AND     tos-3, tos-3, tos-1
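The substitution of register fields described above can be sketched as a simple textual rewrite. This Python fragment is illustrative only (the regular expression and function names are assumptions): it uses only the final two bits of the mapping state, which name the top of stack register, and applies the wrap-around rule to both TOS+n and TOS-n fields.

```python
import re

# tos, tos-1, tos+2, ... as they appear in the templates above.
TOS_FIELD = re.compile(r"\btos([+-]\d+)?\b")


def substitute(template: str, state: str) -> str:
    """Replace tos-relative register fields with R0-R3 for a 5-bit mapping state."""
    tos_register = int(state, 2) & 0b11  # final two bits name the TOS register

    def register(match):
        offset = int(match.group(1) or 0)         # e.g. "-1" -> -1, "+2" -> 2
        return f"R{(tos_register + offset) % 4}"  # wrap around within R0-R3

    return TOS_FIELD.sub(register, template)


if __name__ == "__main__":
    print(substitute("ADD tos-1, tos-1, tos", "01000"))
```

In state 01000 the top of stack is R0, so the iadd template becomes ADD R3, R3, R0, which is the instruction shown in the execution sequence that follows.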



An example execution sequence is illustrated below of a single Java bytecode executed by a
hardware translation unit 108 in accordance with the techniques described above. The
execution sequence is shown in terms of an initial state progressing through a sequence of
states dependent upon the instructions being executed, generating a sequence of ARM
instructions as a result of the actions being performed on each state transition, the whole
having the effect of translating a Java bytecode to a sequence of ARM instructions.

Initial state:          00000

Instruction:            iadd (Require-Full=2, Require-Empty=0, Stack-Action=-1)
Condition:              Require-Full>0
State Transition:       00000 >0 00100
ARM Instruction(s):     LDR R0, [Rstack, #-4]!
Next state:             00100

Instruction:            iadd (Require-Full=2, Require-Empty=0, Stack-Action=-1)
Condition:              Require-Full>1
State Transition:       00100 >1 01000
ARM Instruction(s):     LDR R3, [Rstack, #-4]!
Next state:             01000

Instruction:            iadd (Require-Full=2, Require-Empty=0, Stack-Action=-1)
Condition:              Stack-Action=-1
State Transition:       01000 -1 00111
Instruction template:   ADD tos-1, tos-1, tos
ARM Instruction(s) (after substitution):
                        ADD R3, R3, R0
Next state:             00111
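The load, store and stack-action steps can be combined into a single translation function. The following Python sketch is a software model only, under these assumptions: the bytecode parameters and templates are those listed above, the state rules are those of Tables 2 to 4, and the names BYTECODES, substitute and translate are hypothetical. It reproduces the LDR R0, LDR R3, ADD R3, R3, R0 sequence for iadd starting from state 00000.

```python
import re

# (require_full, require_empty, stack_action, template) for part of the subset above.
BYTECODES = {
    "iconst_0": (0, 1, +1, ["MOV tos+1, #0"]),
    "lload_0": (0, 2, +2, ["LDR tos+2, [vars, #4]", "LDR tos+1, [vars, #0]"]),
    "iadd": (2, 0, -1, ["ADD tos-1, tos-1, tos"]),
    "ineg": (1, 0, 0, ["RSB tos, tos, #0"]),
}


def substitute(line: str, tos: int) -> str:
    """Rewrite tos, tos-n and tos+n fields as R0-R3 (with wrap-around)."""
    return re.sub(r"\btos([+-]\d+)?\b",
                  lambda m: f"R{(tos + int(m.group(1) or 0)) % 4}", line)


def translate(bytecode: str, state: str) -> tuple[list[str], str]:
    """Return the ARM instructions issued for one bytecode and the next mapping state."""
    require_full, require_empty, action, template = BYTECODES[bytecode]
    occupied, tos = int(state, 2) >> 2, int(state, 2) & 0b11
    out = []
    while require_full > occupied:          # Table 2: pop an operand into the cache
        if occupied == 0:
            tos = 0
        out.append(f"LDR R{(tos - occupied) % 4}, [Rstack, #-4]!")
        occupied += 1
    while require_empty > 4 - occupied:     # Table 3: push the deepest operand out
        out.append(f"STR R{(tos - occupied + 1) % 4}, [Rstack], #4")
        occupied -= 1
        if occupied == 0:
            tos = 0
    out += [substitute(line, tos) for line in template]
    occupied += action                      # Table 4: net stack action
    tos = (tos + action) % 4 if occupied else 0
    return out, format((occupied << 2) | tos, "05b")


if __name__ == "__main__":
    state = "00000"
    for bytecode in ("iadd", "lload_0", "lload_0"):
        instructions, state = translate(bytecode, state)
        print(bytecode, instructions, "->", state)
```

Running iadd, lload_0, lload_0 from state 00000 also reproduces the sequence discussed for Figure 6: the first lload_0 ends in state 01101, and the second is preceded by a single STR because only one register is then empty.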



Figure 6 illustrates in a different way the execution of a number of further Java
bytecode instructions. The top portion of Figure 6 illustrates the sequence of ARM
instructions and changes of mapping states and register contents that occur upon execution of
an iadd Java bytecode instruction. The initial mapping state is 00000, corresponding to all of
the registers within the set of registers being empty. The first two ARM instructions
generated serve to POP two stack operands into the registers storing stack operands, with the
top of stack "TOS" register being R0. The third ARM instruction actually performs the add
operation and writes the result into register R3 (which now becomes the top of stack operand)
whilst consuming the stack operand that was previously held within register R0, thus
producing an overall stack action of -1.

Processing then proceeds to execution of two Java bytecodes each representing a long
load of two stack operands. The require empty condition of 2 for the first Java bytecode is
immediately met and accordingly two ARM LDR instructions may be issued and executed.
The mapping state after execution of the first long load Java bytecode is 01101. In this state
the set of registers contains only a single empty register. The next Java bytecode long load
instruction has a require empty value of 2 that is not met and accordingly the first action
required is a PUSH of a stack operand to the addressable memory using an ARM STR
instruction. This frees up a register within the set of registers for use by a new stack operand
which may then be loaded as part of the two following LDR instructions.

As previously mentioned, the instruction translation may be achieved by hardware, software, or a
combination of the two. Given below is a subsection of an example software interpreter
generated in accordance with the above described techniques.



Interpret               LDRB    Rtmp, [Rjpc, #1]!
                        LDR     pc, [pc, Rtmp, lsl #2]
                        DCD     0
                        ...
                        DCD     do_iconst_0             ; Opcode 0x03
                        ...
                        DCD     do_lload_0              ; Opcode 0x1e
                        ...
                        DCD     do_iastore              ; Opcode 0x4f
                        DCD     do_lastore              ; Opcode 0x50
                        ...
                        DCD     do_iadd                 ; Opcode 0x60
                        ...
                        DCD     do_ineg                 ; Opcode 0x74
                        ...
                        DCD     do_land                 ; Opcode 0x7f
                        ...

do_iconst_0             MOV     R0, #0
                        STR     R0, [Rstack], #4
                        B       Interpret

do_lload_0              LDMIA   Rvars, {R0, R1}
                        STMIA   Rstack!, {R0, R1}
                        B       Interpret

do_iastore              LDMDB   Rstack!, {R0, R1, R2}
                        LDR     Rtmp2, [r0, #4]
                        LDR     Rtmp1, [r0, #0]
                        CMP     R1, Rtmp2, LSR #5
                        BCS     ArrayBoundException
                        STR     R2, [Rtmp1, R1, LSL #2]
                        B       Interpret

do_lastore              LDMDB   Rstack!, {R0, R1, R2, R3}
                        LDR     Rtmp2, [r0, #4]
                        LDR     Rtmp1, [r0, #0]
                        CMP     R1, Rtmp2, LSR #5
                        BCS     ArrayBoundException
                        STR     R2, [Rtmp1, R1, LSL #3]!
                        STR     R3, [Rtmp1, #4]
                        B       Interpret

do_iadd                 LDMDB   Rstack!, {r0, r1}
                        ADD     r0, r0, r1
                        STR     r0, [Rstack], #4
                        B       Interpret

do_ineg                 LDR     r0, [Rstack, #-4]!
                        RSB     r0, r0, #0
                        STR     r0, [Rstack], #4
                        B       Interpret

do_land                 LDMDB   Rstack!, {r0, r1, r2, r3}
                        AND     r1, r1, r3
                        AND     r0, r0, r2
                        STMIA   Rstack!, {r0, r1}
                        B       Interpret
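The table-driven dispatch used by Interpret can be modelled in a few lines of Python. This is a behavioural sketch only, not a translation of the ARM code: the Python list stands in for the memory stack addressed by Rstack, iastore and lastore (and hence the array bounds check) are left out, and the handler names merely mirror the labels above.

```python
MASK = 0xFFFFFFFF  # ARM registers and JVM stack words are 32 bits wide


def do_iconst_0(stack, local_vars):
    stack.append(0)


def do_lload_0(stack, local_vars):
    stack.extend(local_vars[0:2])            # word1 then word2, as LDMIA/STMIA do


def do_iadd(stack, local_vars):
    value2, value1 = stack.pop(), stack.pop()
    stack.append((value1 + value2) & MASK)


def do_ineg(stack, local_vars):
    stack.append((0 - stack.pop()) & MASK)   # RSB r0, r0, #0


def do_land(stack, local_vars):
    b2, b1, a2, a1 = stack.pop(), stack.pop(), stack.pop(), stack.pop()
    stack.extend([a1 & b1, a2 & b2])         # result.word1, result.word2


# 256-entry table indexed by opcode, like the DCD table after "LDR pc, [pc, Rtmp, lsl #2]".
DISPATCH = [None] * 256
DISPATCH[0x03] = do_iconst_0
DISPATCH[0x1E] = do_lload_0
DISPATCH[0x60] = do_iadd
DISPATCH[0x74] = do_ineg
DISPATCH[0x7F] = do_land


def interpret(code: bytes, local_vars: list[int]) -> list[int]:
    """Run a bytecode sequence and return the final operand stack."""
    stack = []
    jpc = -1                                 # Rjpc addresses the previous bytecode
    while jpc + 1 < len(code):
        jpc += 1                             # LDRB Rtmp, [Rjpc, #1]! pre-increments
        handler = DISPATCH[code[jpc]]
        if handler is None:
            raise NotImplementedError(f"opcode {code[jpc]:#04x} not modelled")
        handler(stack, local_vars)
    return stack
```

Each handler in the ARM listing ends with a branch back to Interpret; in the model the while loop plays that role.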



State_00000_Interpret   LDRB    Rtmp, [Rjpc, #1]!
                        LDR     pc, [pc, Rtmp, lsl #2]
                        DCD     0
                        ...
                        DCD     State_00000_do_iconst_0     ; Opcode 0x03
                        ...
                        DCD     State_00000_do_lload_0      ; Opcode 0x1e
                        ...
                        DCD     State_00000_do_iastore      ; Opcode 0x4f
                        DCD     State_00000_do_lastore      ; Opcode 0x50
                        ...
                        DCD     State_00000_do_iadd         ; Opcode 0x60
                        ...
                        DCD     State_00000_do_ineg         ; Opcode 0x74
                        ...
                        DCD     State_00000_do_land         ; Opcode 0x7f
                        ...

State_00000_do_iconst_0 MOV     R1, #0
                        B       State_00101_Interpret

State_00000_do_lload_0  LDMIA   Rvars, {R1, R2}
                        B       State_01010_Interpret

State_00000_do_iastore  LDMDB   Rstack!, {R0, R1, R2}
                        LDR     Rtmp2, [r0, #4]
                        LDR     Rtmp1, [r0, #0]
                        CMP     R1, Rtmp2, LSR #5
                        BCS     ArrayBoundException
                        STR     R2, [Rtmp1, R1, LSL #2]
                        B       State_00000_Interpret

State_00000_do_lastore  LDMDB   Rstack!, {R0, R1, R2, R3}
                        LDR     Rtmp2, [r0, #4]
                        LDR     Rtmp1, [r0, #0]
                        CMP     R1, Rtmp2, LSR #5
                        BCS     ArrayBoundException
                        STR     R2, [Rtmp1, R1, LSL #3]!
                        STR     R3, [Rtmp1, #4]
                        B       State_00000_Interpret

State_00000_do_iadd     LDMDB   Rstack!, {R1, R2}
                        ADD     r1, r1, r2
                        B       State_00101_Interpret

State_00000_do_ineg     LDR     r1, [Rstack, #-4]!
                        RSB     r1, r1, #0
                        B       State_00101_Interpret

State_00000_do_land     LDR     r0, [Rstack, #-4]!
                        LDMDB   Rstack!, {r1, r2, r3}
                        AND     r2, r2, r0
                        AND     r1, r1, r3
                        B       State_01010_Interpret





State_00100_Interpret   LDRB    Rtmp, [Rjpc, #1]!
                        LDR     pc, [pc, Rtmp, lsl #2]
                        DCD     0
                        ...
                        DCD     State_00100_do_iconst_0     ; Opcode 0x03
                        ...
                        DCD     State_00100_do_lload_0      ; Opcode 0x1e
                        ...
                        DCD     State_00100_do_iastore      ; Opcode 0x4f
                        DCD     State_00100_do_lastore      ; Opcode 0x50
                        ...
                        DCD     State_00100_do_iadd         ; Opcode 0x60
                        ...
                        DCD     State_00100_do_ineg         ; Opcode 0x74
                        ...
                        DCD     State_00100_do_land         ; Opcode 0x7f
                        ...

State_00100_do_iconst_0 MOV     R1, #0
                        B       State_01001_Interpret

State_00100_do_lload_0  LDMIA   Rvars, {r1, R2}
                        B       State_01110_Interpret

State_00100_do_iastore  LDMDB   Rstack!, {r2, r3}
                        LDR     Rtmp2, [r2, #4]
                        LDR     Rtmp1, [r2, #0]
                        CMP     R3, Rtmp2, LSR #5
                        BCS     ArrayBoundException
                        STR     R0, [Rtmp1, R3, lsl #2]
                        B       State_00000_Interpret

State_00100_do_lastore  LDMDB   Rstack!, {r1, r2, r3}
                        LDR     Rtmp2, [r1, #4]
                        LDR     Rtmp1, [r1, #0]
                        CMP     r2, Rtmp2, LSR #5
                        BCS     ArrayBoundException
                        STR     r3, [Rtmp1, r2, lsl #3]!
                        STR     r0, [Rtmp1, #4]
                        B       State_00000_Interpret

State_00100_do_iadd     LDR     r3, [Rstack, #-4]!
                        ADD     r3, r3, r0
                        B       State_00111_Interpret

State_00100_do_ineg     RSB     r0, r0, #0
                        B       State_00100_Interpret

State_00100_do_land     LDMDB   Rstack!, {r1, r2, r3}
                        AND     r2, r2, r0
                        AND     r1, r1, r3
                        B       State_01010_Interpret



State_01000_Interpret   LDRB    Rtmp, [Rjpc, #1]!
                        LDR     pc, [pc, Rtmp, lsl #2]
                        DCD     0
                        ...
                        DCD     State_01000_do_iconst_0     ; Opcode 0x03
                        ...
                        DCD     State_01000_do_lload_0      ; Opcode 0x1e
                        ...
                        DCD     State_01000_do_iastore      ; Opcode 0x4f
                        DCD     State_01000_do_lastore      ; Opcode 0x50
                        ...
                        DCD     State_01000_do_iadd         ; Opcode 0x60
                        ...
                        DCD     State_01000_do_ineg         ; Opcode 0x74
                        ...
                        DCD     State_01000_do_land         ; Opcode 0x7f
                        ...

State_01000_do_iconst_0 MOV     R1, #0
                        B       State_01101_Interpret

State_01000_do_lload_0  LDMIA   Rvars, {r1, r2}
                        B       State_10010_Interpret

State_01000_do_iastore  LDR     r1, [Rstack, #-4]!
                        LDR     Rtmp2, [r1, #4]
                        LDR     Rtmp1, [r1, #0]
                        CMP     r3, Rtmp2, LSR #5
                        BCS     ArrayBoundException
                        STR     r0, [Rtmp1, r3, lsl #2]
                        B       State_00000_Interpret

State_01000_do_lastore  LDMDB   Rstack!, {r1, r2}
                        LDR     Rtmp2, [r1, #4]
                        LDR     Rtmp1, [r1, #0]
                        CMP     r2, Rtmp2, LSR #5
                        BCS     ArrayBoundException
                        STR     r3, [Rtmp1, r2, lsl #3]!
                        STR     r0, [Rtmp1, #4]
                        B       State_00000_Interpret

State_01000_do_iadd     ADD     r3, r3, r0
                        B       State_00111_Interpret

State_01000_do_ineg     RSB     r0, r0, #0
                        B       State_01000_Interpret

State_01000_do_land     LDMDB   Rstack!, {r1, r2}
                        AND     R0, R0, R2
                        AND     R3, R3, R1
                        B       State_01000_Interpret



State_01100_Interpret   ...
State_10000_Interpret   ...
State_00101_Interpret   ...
State_01001_Interpret   ...
State_01101_Interpret   ...
State_10001_Interpret   ...
State_00110_Interpret   ...
State_01010_Interpret   ...
State_01110_Interpret   ...
State_10010_Interpret   ...
State_00111_Interpret   ...
State_01011_Interpret   ...
State_01111_Interpret   ...
State_10011_Interpret   ...

Figure 7 illustrates a Java bytecode instruction "laload" which has the function of
reading two words of data from within a data array specified by two words of data starting at
the top of stack position. The two words read from the data array then replace the two words
that specified their position and form the topmost stack entries.

In order that the "laload" instruction has sufficient register space for the temporary 
storage of the stack operands being fetched from the array without overwriting the input stack 
operands that specify the array and position within the array of the data, the Java bytecode 
instruction is specified as having a require empty value of 2, i.e. two of the registers within 
the register bank dedicated to stack operand storage must be emptied prior to executing the 
ARM instructions emulating the "laload" instruction. If there are not two empty registers 
when this Java bytecode is encountered, then store operations (STRs) may be performed to 
PUSH stack operands currently held within the registers out to memory so as to make space 
for the temporary storage necessary and meet the require empty value for the instruction. 

The instruction also has a require full value of 2 as the position of the data is specified 
by an array location and an index within that array as two separate stack operands. The 
drawing illustrates the first state as already meeting the require full and require empty 
conditions and having a mapping state of "01001". The "laload" instruction is broken down 
into three ARM instructions. The first of these loads the array reference into a spare working 
register outside of the set of registers acting as a register cache of stack operands. The second 
instruction then uses this array reference in conjunction with an index value within the array 
to access a first array word that is written into one of the empty registers dedicated to stack 
operand storage. 

It is significant to note that after the execution of the first two ARM instructions, the 
mapping state of the system is not changed and the top of stack pointer remains where it 
started with the registers specified as empty still being so specified. 
then step 21 serves to update the program counter value to point to the next Java bytecode in
the sequence of instructions to be executed. It will be understood that if the ARM instruction 
is the final instruction, then it will complete its execution irrespective of whether or not an 
interrupt now occurs and accordingly it is safe to update the program counter value to the next 
Java bytecode and restart execution from that point as the state of the system will have 
reached that matching normal, uninterrupted, full execution of the Java bytecode. If the test 
at step 20 indicates that the final bytecode has not been reached, then updating of the program 
counter value is bypassed. 

Step 22 executes the current ARM instruction. At step 24 a test is made as to whether 
or not there are any more ARM instructions that require executing as part of the template. If 
there are more ARM instructions, then the next of these is selected at step 26 and processing 
is returned to step 20. If there are no more instructions, then processing proceeds to step 28 at 
which any mapping change/swap specified for the Java bytecode concerned is performed in 
order to reflect the desired top of stack location and full/empty status of the various registers 
holding stack operands. 



Figure 8 also schematically illustrates the points at which an interrupt if asserted is 
serviced and then processing restarted after an interrupt. An interrupt starts to be serviced 
after the execution of an ARM instruction currently in progress at step 22 with whatever is the 
current program counter value being stored as a return point with the bytecode sequence. If 
the current ARM instruction executing is the final instruction within the template sequence, 
then step 21 will have just updated the program counter value and accordingly this will point 
to the next Java bytecode (or ARM instruction should an instruction set switch have just been 
initiated). If the currently executing ARM instruction is anything other than the final 
instruction in the sequence, then the program counter value will still be the same as that 
indicated at the start of the execution of the Java bytecode concerned and accordingly when a 
return is made, the whole Java bytecode will be re-executed. 
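The restart rule described for Figure 8 can be sketched as follows. In this hypothetical Python model, instruction strings are placeholders and a bytecode's length in bytes is passed explicitly; the point illustrated is that the saved program counter only advances once the final ARM instruction of the template has been reached, so an interrupt taken earlier causes the whole bytecode to be re-executed on return.

```python
def run_template(pc: int, length: int, template: list[str], interrupt_after=None):
    """Model Figure 8: return (resume_pc, executed) for one bytecode's template.

    `interrupt_after` is the index of the ARM instruction after which an
    interrupt is taken, or None if execution runs to completion.
    """
    executed = []
    for index, instruction in enumerate(template):
        if index == len(template) - 1:   # step 20: is this the final instruction?
            pc += length                 # step 21: point at the next bytecode
        executed.append(instruction)     # step 22: execute it
        if index == interrupt_after:
            return pc, executed          # interrupt serviced; pc is the return point
    # Step 28 (the mapping change/swap) would follow here.
    return pc, executed
```

With a three-instruction template at bytecode address 100, an interrupt after the first instruction returns to 100, while an interrupt after the last instruction returns to 101.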

Figure 9 illustrates a Java bytecode translation unit 68 that receives a stream of Java 
bytecodes and outputs a translated stream of ARM instructions (or corresponding control 
signals) to control the action of a processor core. As described previously, the Java bytecode 
translator 68 translates simple Java bytecodes using instruction templates into ARM instructions 
or sequences of ARM instructions. When each Java bytecode has been executed, then a counter 
Figure 11 illustrates an alternative control arrangement. At the start of processing at step
86 an instruction signal (scheduling signal) is deasserted. At step 88, a fetched Java bytecode is
examined to see if it is a simple bytecode for which hardware translation is supported. If
hardware translation is not supported, then control is passed out to the interpreting software at
step 90 which then executes an ARM instruction routine to interpret the Java bytecode. If the
bytecode is a simple one for which hardware translation is supported, then processing proceeds
to step 92 at which one or more ARM instructions are issued in sequence by the Java bytecode
translation unit 68 acting as a form of multi-cycle finite state machine. Once the Java bytecode
has been properly executed either at step 90 or at step 92, then processing proceeds to step 94 at
which the instruction signal is asserted for a short period prior to being deasserted at step 86.
The assertion of the instruction signal indicates to external circuitry that an appropriate safe point
has been reached at which a timer based scheduling interrupt could take place without risking a
loss of data integrity due to the partial execution of an interpreted or translated instruction.



Figure 12 illustrates example circuitry that may be used to respond to the instruction
signal generated in Figure 11. A timer 96 periodically generates a timer signal after expiry of a
given time period. This timer signal is stored within a latch 98 until it is cleared by a clear timer
interrupt signal. The output of the latch 98 is logically combined by an AND gate 100 with the
instruction signal asserted at step 94. When the latch is set and the instruction signal is asserted,
then an interrupt is generated as the output of the AND gate 100 and is used to trigger an
interrupt that performs scheduling operations using the interrupt processing mechanisms
provided within the system for standard interrupt processing. Once the interrupt signal has been
generated, this in turn triggers the production of a clear timer interrupt signal that clears the latch
98 until the next timer output pulse occurs.
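The behaviour of the Figure 12 circuit can be modelled cycle by cycle. This Python sketch is a simplification under stated assumptions: each signal is represented by the set of clock cycles at which it is asserted, the function name is hypothetical, and the latch is assumed to be cleared in the same cycle that the interrupt is raised.

```python
def scheduling_interrupts(timer_pulses: set[int], instruction_signals: set[int],
                          cycles: int) -> list[int]:
    """Return the clock cycles at which AND gate 100 raises a scheduling interrupt."""
    latch = False                              # latch 98
    interrupts = []
    for cycle in range(cycles):
        if cycle in timer_pulses:
            latch = True                       # timer 96 output is latched
        if latch and cycle in instruction_signals:
            interrupts.append(cycle)           # AND gate 100 output
            latch = False                      # clear timer interrupt signal
    return interrupts
```

For instance, a timer pulse at cycle 5 followed by instruction signals at cycles 7 and 300 yields a single interrupt, at cycle 7, because the latch is cleared once the interrupt has been generated.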

Figure 13 is a signal diagram illustrating the operation of the circuit of Figure 12. The
processor core clock signals occur at a regular frequency. The timer 96 generates timer signals at
predetermined periods to indicate that, when safe, a scheduling operation should be initiated.
The timer signals are latched. Instruction signals are generated at times spaced apart by intervals
that depend upon how quickly a particular Java bytecode was executed. A simple Java bytecode
may execute in a single processor core clock cycle, or more typically two or three, whereas a
complex Java bytecode providing a high level management type function may take several
hundred processor clock cycles before its execution is completed by the software interpreter. In
CLAIMS



1 . Apparatus for processing data, said apparatus comprising: 

a processor core operable to execute operations as specified by instructions of a first 
instruction set, said processor core having an instruction pipeline into which instructions to be 
executed are fetched from a memory and along which instructions progress; and 

an instruction translator operable to translate instructions of a second instruction set 
into translator output signals corresponding to instructions of said first instruction set; wherein 

said instruction translator is within said instruction pipeline and translates instructions 
of said second instruction set that have been fetched into said instruction pipeline from said 
memory; 

at least one instruction of said second instruction set specifies a multi-step operation 
that requires a plurality of operations that may be specified by instructions of said first 
instruction set in order to be performed by said processor core; and 

said instruction translator is operable to generate a sequence of translator output 
signals to control said processor core to perform said multi-step operation. 

2. Apparatus as claimed in claim 1, wherein said translator output signals include signals 
forming an instruction of said first instruction set. 

3. Apparatus as claimed in any one of claims 1 and 2, wherein said translator output 
signals include control signals that control operation of said processor core and match control 
signals produced on decoding instructions of said first instruction set. 

4. Apparatus as claimed in any one of claims 1, 2 and 3, wherein said translator output 
signals include control signals that control operation of said processor core and specify 
parameters not specified by control signals produced on decoding instructions of said first 
instruction set. 

5. Apparatus as claimed in any one of the preceding claims, wherein said processor core 
fetches instructions from an instruction address within said memory specified by a program 
counter value held by said processor core. 
14. Apparatus as claimed in any one of the preceding claims, wherein said instructions of
said second instruction set are Java Virtual Machine bytecodes. 

15. A method of processing data using a processor core having an instruction pipeline into 
which instructions to be executed are fetched from a memory and along which instructions 
progress, said processor core being operable to execute operations specified by instructions of 
a first instruction set, said method comprising the steps of: 

fetching instructions into said instruction pipeline; and 

translating fetched instructions of a second instruction set into translator output signals 
corresponding to instructions of said first instruction set using an instruction translator within 
said instruction pipeline; wherein 

at least one instruction of said second instruction set specifies a multi-step operation 
that requires a plurality of operations that may be specified by instructions of said first 
instruction set in order to be performed by said processor core; and 

said instruction translator is operable to generate a sequence of translator output 
signals to control said processor core to perform said multi-step operation. 

16. A computer program product holding a computer program for controlling a computer
to perform the method of claim 15.



17. Apparatus for processing data, said apparatus comprising: 

a processor core operable to execute operations as specified by instructions of a first 
instruction set, said processor core having an instruction pipeline into which instructions to be 
executed are fetched from a memory and along which instructions progress; and 

an instruction translator operable to translate instructions of a second instruction set 
into translator output signals corresponding to instructions of said first instruction set; wherein 

said instructions of said second instruction set are variable length instructions; 

said instruction translator is within said instruction pipeline and translates instructions 
of said second instruction set that have been fetched into a fetch stage of said instruction 
pipeline from said memory; and 

said fetch stage of said instruction pipeline includes an instruction buffer holding at 
least a current instruction word and a next instruction word fetched from said memory such 
that if a variable length instruction of said second instruction set starts within said current 
instruction word and extends into said next instruction word, then said next instruction word 

said instruction translator is within said instruction pipeline and translates instructions
of said second instruction set that have been fetched into a fetch stage of said instruction 
pipeline from said memory; and 

said fetch stage of said instruction pipeline includes an instruction buffer holding at 
least a current instruction word and a next instruction word fetched from said memory such 
that if a variable length instruction of said second instruction set starts within said current 
instruction word and extends into said next instruction word, then said next instruction word 
is available within said pipeline for translation by said instruction translator without requiring 
a further fetch operation. 
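The two-word buffering recited above can be illustrated with a small software model. The sketch below is an illustration under stated assumptions rather than the claimed hardware. It assumes 32-bit instruction words and byte-aligned Java bytecodes, and it models only six real JVM opcodes. `TwoWordBuffer`, `decode_all` and `JVM_LENGTHS` are hypothetical names. The sketch shows that an instruction starting in the current word and ending in the next word can be assembled from the buffer without an extra fetch at decode time.

```python
# Lengths in bytes (opcode plus operands) of a few real JVM opcodes:
# nop, bipush, sipush, iload, iadd, goto. Other opcodes are not modelled.
JVM_LENGTHS = {0x00: 1, 0x10: 2, 0x11: 3, 0x15: 2, 0x60: 1, 0xA7: 3}
WORD = 4  # assumption: 32-bit instruction words fetched from memory


class TwoWordBuffer:
    """Hypothetical fetch-stage buffer holding a current and a next instruction word."""

    def __init__(self, memory: bytes):
        self.memory = memory
        self.fetches = 0
        self.base = 0      # byte address of the current word
        self.offset = 0    # position of the next bytecode within the current word
        self.current = self._fetch(0)
        self.next = self._fetch(WORD)

    def _fetch(self, addr: int) -> bytes:
        """One whole-word memory fetch; bytes past the end of memory read as zero."""
        self.fetches += 1
        word = self.memory[addr:addr + WORD]
        return word + bytes(WORD - len(word))

    def next_instruction(self) -> bytes:
        window = self.current + self.next            # both words visible to the translator
        length = JVM_LENGTHS[window[self.offset]]    # KeyError for unmodelled opcodes
        instr = window[self.offset:self.offset + length]  # may span into the next word
        self.offset += length
        if self.offset >= WORD:                      # current word consumed: slide the window
            self.offset -= WORD
            self.base += WORD
            self.current = self.next
            self.next = self._fetch(self.base + WORD)  # refill after the instruction is complete
        return instr


def decode_all(code: bytes):
    """Split a bytecode stream into instructions and report the number of word fetches."""
    buf = TwoWordBuffer(code)
    instructions = []
    while buf.base + buf.offset < len(code):
        instructions.append(buf.next_instruction())
    return instructions, buf.fetches
```

Decoding `bytes([0x60, 0x11, 0x00, 0x02, 0x10, 0x07, 0xA7, 0xFF, 0xF9, 0x00])` (iadd, sipush 2, bipush 7, goto -7, nop) yields five instructions. The 3-byte goto starts in the second word and ends in the third, yet each of the three words is fetched exactly once, plus one prefetch beyond the end of the code, giving four fetches in total. The refill is issued only after the current word has been consumed, so the tail of a spanning instruction is always already present in the next word.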

25. A computer program product holding a computer program for controlling a computer 
to perform the method of claim 24. 

26. Apparatus for data processing substantially as hereinbefore described with reference to
the accompanying drawings.

27. A method of data processing substantially as hereinbefore described with reference to 
the accompanying drawings. 

28. A computer program product holding a computer program for controlling a computer
to perform a method substantially as hereinbefore described with reference to the 
accompanying drawings. 

[Figure 3]